Abstract

We study a parametric estimation problem related to moment con- 
dition models. As an alternative to the generalized empirical like- 
lihood (GEL) and the generalized method of moments (GMM), a 
Bayesian approach to the problem can be adopted, extending the 
MEM procedure to parametric moment conditions. We show in par- 
ticular that a large number of GEL estimators can be interpreted as a 
maximum entropy solution. Moreover, we provide a more general field 
of applications by proving the method to be robust to approximate 
moment conditions. 



1 Introduction 

We consider a parametric estimation problem in a moment condition
model. Assume we observe an i.i.d. sample $X_1, \dots, X_n$ drawn from an
unknown probability measure $\mu_0$; we are interested in recovering a
parameter $\theta_0 \in \Theta \subset \mathbb{R}^d$, defined by a set of
moment conditions

$$\int \Phi(\theta_0, x)\,d\mu_0(x) = 0, \qquad (1)$$

where $\Phi : \Theta \times \mathcal{X} \to \mathbb{R}^k$ is a known map.
This model arises in many problems in econometrics, notably when dealing
with instrumental variables. We refer to [Cha87], [Han82], [QL94], [Owe91]
and [DIN09]. Two main approaches to the problem have been studied in the
literature, namely the generalized method of moments (GMM) and the
generalized empirical likelihood (GEL). While the main advantage of GMM
lies in its computational feasibility, likelihood-related methods have
proved more efficient in terms of small-sample properties. In its original
form, the empirical likelihood (EL) of Owen [Owe91] defines an estimator by
a maximum likelihood procedure on a discretized version of the model. As an
alternative, GEL replaces the Kullback criterion used in EL by an
$f$-divergence, thus providing a large choice of solutions. A number of
estimators corresponding to particular choices of $f$-divergences have
emerged in the literature over the last decades, such as the exponential
tilting (ET) of Kitamura and Stutzer [KS97] and the continuous updating
estimator (CUE) of Hansen, Heaton and Yaron [HHY96].

While an attractive feature of GEL is its wide range of solutions, a number
of the $f$-divergences used in the computation of GEL estimators are mainly
justified by empirical studies and lack a probabilistic interpretation.
This issue can be addressed by incorporating some prior information into
the problem from a Bayesian point of view, as done in [PR94]. In this
paper, we investigate a different Bayesian approach to the inverse problem,
known as maximum entropy on the mean (MEM). Although the method was
originally introduced in the framework of exact moment condition models (as
opposed here to parametric moment conditions), it appears to provide a
natural solution to the problem, expressed as the minimizer of a convex
functional on a set of discrete measures, subject to linear constraints.
When applied in a particular setting, we show that the MEM approach leads
to a GEL solution for which the $f$-divergence is determined by the choice
of the prior. As a result, the method gives an alternative point of view on
some widely used estimators such as EL, ET or CUE, as well as a general
Bayesian background to GEL.

In many practical situations, the true moment condition is not exactly
known to the statistician and only an approximation is available. This
occurs for instance when $\Phi$ has a complicated form that must be
evaluated numerically. Simulation-based methods have been implemented to
deal with approximate constraints in [CF00] and [McF89], in the framework
of the generalized method of moments. To our knowledge, the efficiency of
GEL in a similar situation has not been studied. In [LP08], the MEM
procedure is shown to be robust to approximate moment conditions,
introducing the approximate maximum entropy on the mean estimator. Seeing
GEL as a particular case of MEM, we extend the model to a situation where
only an approximation of the true constraint function $\Phi$ is available.
We provide sufficient conditions under which the GEL method remains
asymptotically efficient when $\Phi$ is replaced by its approximation.

This paper is organized as follows. Section 2 is devoted to the statement
of the problem. We introduce the maximum entropy method for parametric
moment condition models and discuss its close relationship with generalized
empirical likelihood in Section 2.2. In Section 3, we discuss the
asymptotic efficiency of the method when dealing with an approximate
constraint. Proofs are postponed to the Appendix.

2 Estimation of the parameter 

Let $\mathcal{X}$ be an open subset of $\mathbb{R}^q$, endowed with its
Borel field $\mathcal{B}(\mathcal{X})$, and let $\mathcal{P}(\mathcal{X})$
denote the set of probability measures on $\mathcal{X}$. We observe an
i.i.d. sample $X_1, \dots, X_n$ drawn from the unknown distribution
$\mu_0$. We want to estimate the parameter
$\theta_0 \in \Theta \subset \mathbb{R}^d$ defined by the moment condition

$$\int_{\mathcal{X}} \Phi(\theta_0, x)\,d\mu_0(x) = 0, \qquad (2)$$

where $\Phi : \Theta \times \mathcal{X} \to \mathbb{R}^k$ ($k \ge d$) is a
known map. To avoid identifiability problems, we assume that $\theta_0$ is
the unique solution to (2). This problem has many applications in
econometrics, see for instance [Cha87], [Han82] and [QL94]. The information
given by the moment condition (2) can be interpreted as determining the set
of possible values for $\mu_0$ (the model). The true value of the parameter
being unknown, the distribution of the observations can be any probability
measure $\mu$ for which the map
$\theta \mapsto \int \Phi(\theta, \cdot)\,d\mu$ vanishes at a unique
$\theta = \theta(\mu) \in \Theta$. The model is therefore defined as

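$$\mathcal{M} = \left\{ \mu \in \mathcal{P}(\mathcal{X}) : \exists!\, \theta = \theta(\mu) \in \Theta, \ \int \Phi(\theta, \cdot)\,d\mu = 0 \right\},$$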

where the map $\mu \mapsto \theta(\mu)$, defined on $\mathcal{M}$, is the
parameter of interest. Let us introduce some notation and assumptions. For
$\mu$ a measure and $g$ a function, we write $\mu[g] = \int g\,d\mu$. Let
$E$ be a Euclidean space and let $\|\cdot\|$ denote a Euclidean norm on
$E$. For a function $f : \Theta \to E$ and a set $S \subseteq \Theta$, we
write

$$\|f\|_S = \sup_{\theta \in S} \|f(\theta)\|.$$

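We assume that the following conditions are fulfilled.

A.1. $\Theta$ is a compact subset of $\mathbb{R}^d$.

A.2. The true value $\theta_0$ of the parameter lies in the interior of
$\Theta$.

A.3. For all $x \in \mathcal{X}$, $\theta \mapsto \Phi(\theta, x)$ is
continuous on $\Theta$ and the map $x \mapsto \|\Phi(\cdot, x)\|_\Theta$ is
dominated by a $\mu_0$-integrable function.

A.4. For all $x \in \mathcal{X}$, $\theta \mapsto \Phi(\theta, x)$ is twice
continuously differentiable in a neighborhood $\mathcal{N}$ of $\theta_0$
and we write
$\nabla\Phi(\theta, x) = \partial \Phi(\theta, x)/\partial\theta \in \mathbb{R}^{d \times k}$
and
$\Psi(\theta, x) = \partial^2 \Phi(\theta, x)/\partial\theta\,\partial\theta^t \in \mathbb{R}^{d \times d \times k}$
(where $a^t$ stands for the transpose of $a$). Moreover, we assume that
$x \mapsto \|\nabla\Phi(\cdot, x)\|_{\mathcal{N}}$ and
$x \mapsto \|\Psi(\cdot, x)\|_{\mathcal{N}}$ are dominated by a
$\mu_0$-integrable function.

A.5. The matrices

$$D = \int_{\mathcal{X}} \nabla\Phi(\theta_0, \cdot)\,d\mu_0 \in \mathbb{R}^{d \times k} \quad \text{and} \quad V = \int_{\mathcal{X}} \Phi(\theta_0, \cdot)\,\Phi^t(\theta_0, \cdot)\,d\mu_0 \in \mathbb{R}^{k \times k}$$

are of full rank.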

Some difficulties in estimating $\theta_0$ may arise from the indirect
definition of the parameter. These assumptions ensure that the map
$\theta(\cdot)$ is sufficiently smooth in a neighborhood of $\mu_0$ for the
total variation topology, which makes the asymptotic properties of the GEL
estimator tractable (see for instance [NS04]).






2.1 Generalized empirical likelihood 

Generalized empirical likelihood (GEL) was first applied to this problem in
[QL94], generalizing an idea of [Owe91]. An estimate $\hat\mu$ of $\mu_0$
is obtained as an entropic projection of the empirical measure
$\mathbb{P}_n$ onto the model $\mathcal{M}$. Precisely, for two probability
measures $\mu$ and $\nu$ and $f$ a convex function such that
$f(1) = f'(1) = 0$, define

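$$\mathcal{D}_f(\nu | \mu) = \int f\!\left(\frac{d\nu}{d\mu}\right) d\mu \ \text{ if } \nu \ll \mu, \qquad \mathcal{D}_f(\nu | \mu) = +\infty \ \text{ otherwise.}$$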

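Moreover, we define for $A \subseteq \mathcal{P}(\mathcal{X})$,
$\mathcal{D}_f(A | \mu) = \inf_{\nu \in A} \mathcal{D}_f(\nu | \mu)$. The
GEL estimator $\hat\mu$ of $\mu_0$ is the element of the model that
minimizes a given $f$-divergence $\mathcal{D}_f(\cdot | \mathbb{P}_n)$ with
respect to the empirical distribution. Noticing that
$\mathcal{M} = \cup_{\theta \in \Theta} \mathcal{M}_\theta$ where
$\mathcal{M}_\theta := \{\mu \in \mathcal{P}(\mathcal{X}) : \mu[\Phi(\theta, \cdot)] = 0\}$,
the GEL estimator $\hat\theta = \theta(\hat\mu)$ is given by

$$\hat\theta = \arg\min_{\theta \in \Theta} \mathcal{D}_f(\mathcal{M}_\theta | \mathbb{P}_n).$$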
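For a measure $\nu = \sum_i p_i \delta_{X_i}$ supported on the sample, the density with respect to $\mathbb{P}_n$ equals $n p_i$ at $X_i$, so the divergence reduces to $\frac{1}{n}\sum_i f(n p_i)$. The following minimal Python sketch evaluates this quantity for three standard generators satisfying $f(1) = f'(1) = 0$. The function names and the toy weights are illustrative assumptions and do not come from the paper.

```python
import math

# Three convex generators normalised so that f(1) = f'(1) = 0.
def f_kl(x):
    # f(x) = x log x - x + 1 (Kullback-type generator)
    return x * math.log(x) - x + 1.0 if x > 0 else 1.0

def f_el(x):
    # f(x) = -log x + x - 1; minimising it maximises sum_i log p_i
    return -math.log(x) + x - 1.0 if x > 0 else math.inf

def f_chi2(x):
    # f(x) = (x - 1)^2 / 2 (quadratic generator)
    return 0.5 * (x - 1.0) ** 2

def f_divergence(f, weights):
    """D_f(nu | P_n) for nu = sum_i p_i delta_{X_i}; d(nu)/d(P_n) = n * p_i at X_i."""
    n = len(weights)
    return sum(f(n * p) for p in weights) / n

uniform = [0.25] * 4            # the empirical measure itself
tilted = [0.1, 0.2, 0.3, 0.4]   # hypothetical reweighting of four observations
for name, f in [("KL", f_kl), ("EL", f_el), ("chi2", f_chi2)]:
    print(name, f_divergence(f, uniform), f_divergence(f, tilted))
```

Each divergence vanishes at the uniform weights and is positive for any other probability vector, which is why the minimizer over $\mathcal{M}_\theta$ can be read as the reweighting of the sample closest to $\mathbb{P}_n$.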

Since the set of discrete measures in $\mathcal{M}_\theta$ is closed and
convex, the entropy $\mathcal{D}_f(\mathcal{M}_\theta | \mathbb{P}_n)$ is
reached for a unique measure $\hat\mu(\theta)$ in $\mathcal{M}_\theta$,
provided that $\mathcal{D}_f(\mathcal{M}_\theta | \mathbb{P}_n)$ is finite.
Computing the GEL estimator therefore involves a two-step procedure. First,
build for each $\theta \in \Theta$ the entropic projection
$\hat\mu(\theta)$ of $\mathbb{P}_n$ onto $\mathcal{M}_\theta$. Then,
minimize $\mathcal{D}_f(\hat\mu(\theta) | \mathbb{P}_n)$ with respect to
$\theta$. Since $\hat\mu(\theta)$ is absolutely continuous w.r.t.
$\mathbb{P}_n$ by construction, minimizing
$\mathcal{D}_f(\cdot | \mathbb{P}_n)$ reduces to finding the proper weights
$p_1, \dots, p_n$ to allocate to the observations $X_1, \dots, X_n$. This
turns into a finite dimensional problem, which can be solved by classical
convex optimization tools (see for instance [Kit06]). In fact, the GEL
estimator $\hat\theta$ can be expressed as the solution to the saddle point
problem

$$\hat\theta = \arg\min_{\theta \in \Theta} \sup_{(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k} \gamma - \mathbb{P}_n\left[ f^*\left(\gamma + \lambda^t \Phi(\theta, \cdot)\right) \right],$$

where $f^*(x) = \sup_y \{xy - f(y)\}$ denotes the convex conjugate of $f$.

Note that while the choice of the $f$-divergence plays a key role in the
construction of the estimator, it has no influence on its asymptotic
efficiency. Indeed, it is shown in [QL94] that all GEL estimators are
asymptotically efficient, regardless of the $f$-divergence used for their
computation. Nevertheless, some situations justify the use of specific
$f$-divergences. The empirical likelihood estimator introduced by Owen in
[Owe91] uses the Kullback entropy $\mathcal{K}(\cdot, \cdot)$ as
$f$-divergence, pointing out that minimizing
$\mathcal{K}(\cdot, \mathbb{P}_n)$ reduces to maximizing likelihood among
multinomial distributions. Newey and Smith [NS04] remark that a quadratic
$f$-divergence leads to the CUE estimator of Hansen, Heaton and Yaron
[HHY96].
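The two-step procedure described in this section can be sketched in a few lines of Python. The sketch below is only an illustration under strong simplifying assumptions that are not in the paper: a scalar mean model $\Phi(\theta, x) = x - \theta$ (so $k = d = 1$), the Kullback divergence (exponential tilting) because its projection has the explicit form $p_i \propto \exp(\lambda \Phi(\theta, X_i))$, and a coarse grid for the outer minimization. All names and the toy data are hypothetical. In this exactly identified case the outer step is trivial and returns the sample mean; the point is only to show the mechanics.

```python
import math

def tilt_weights(xs, theta, iters=200):
    """Weights of the exponential-tilting projection of the empirical measure
    onto {mu : mu[x - theta] = 0}: p_i proportional to exp(lam * (x_i - theta))."""
    phi = [x - theta for x in xs]
    if not (min(phi) < 0.0 < max(phi)):
        raise ValueError("infeasible: theta must lie strictly inside the data range")

    def g(lam):
        # Derivative in lam of sum_i exp(lam * phi_i); increasing in lam.
        return sum(p * math.exp(lam * p) for p in phi)

    lo, hi = -1.0, 1.0
    while g(lo) > 0.0:          # expand the bracket until it contains the root
        lo *= 2.0
    while g(hi) < 0.0:
        hi *= 2.0
    for _ in range(iters):      # bisection on the monotone function g
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    raw = [math.exp(lam * p) for p in phi]
    total = sum(raw)
    return [r / total for r in raw]

def kl_to_empirical(weights):
    """Kullback divergence sum_i p_i log(n p_i) of the weighted measure from P_n."""
    n = len(weights)
    return sum(p * math.log(n * p) for p in weights if p > 0.0)

def profile(xs, theta):
    """Step 1: divergence of the entropic projection for a fixed theta."""
    return kl_to_empirical(tilt_weights(xs, theta))

xs = [0.2, 1.1, 1.9, 2.4, 3.4]                        # toy sample
grid = [1.0 + 0.1 * j for j in range(17)]             # candidate values of theta
theta_hat = min(grid, key=lambda t: profile(xs, t))   # step 2: minimise over theta
print(theta_hat, sum(xs) / len(xs))
```

With over-identifying restrictions ($k > d$) the inner step becomes a $k$-dimensional convex problem in $\lambda$ and the outer minimizer no longer has a closed form, but the structure of the computation is the same.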

2.2 Maximum entropy on the mean 

In this section, we study a Bayesian approach to the inverse problem, known
as maximum entropy on the mean (MEM) [GG97]. The method was developed to
estimate a measure $\mu_0$ based on the observation of some of its moments.
In this framework, it turns out that the MEM estimator of $\mu_0$ can be
used to estimate the parameter $\theta_0$ efficiently. We briefly recall
the MEM procedure. Consider an estimator of $\mu_0$ in the form of a
weighted version of the empirical measure $\mathbb{P}_n$,

$$\mathbb{P}_n(w) = \frac{1}{n} \sum_{i=1}^n w_i\,\delta_{X_i},$$

for $w = (w_1, \dots, w_n)^t \in \mathbb{R}^n$ a collection of weights.
Then, fix a prior distribution $\nu_0$ on the vector of weights $w$ so that
each solution $\mathbb{P}_n(w)$ can be viewed as a realization of the
random measure $\mathbb{P}_n(W)$, where $W$ is drawn from $\nu_0$. This
setting makes it possible to incorporate some prior knowledge on the shape
or support of $\mu_0$ through the choice of the prior $\nu_0$, as discussed
in [GG97]. Here, the observations $X_1, \dots, X_n$ are considered fixed.
It is the moment condition that is used to build the estimator a
posteriori. In this framework where the true value of the parameter is
unknown, the information provided by the moment condition reduces to the
statement $\mu_0 \in \mathcal{M}$. So, in order to take this information
into consideration, the underlying idea of MEM is to build the estimator
$\hat\mu$ as the expectation of $\mathbb{P}_n(W)$ conditionally on the
event $\{\mathbb{P}_n(W) \in \mathcal{M}\}$. However, we may encounter some
difficulties if this conditional expectation is not properly defined. To
deal with this issue, the MEM method replaces the possibly ill-defined
conditional expectation by a well-defined estimator, whose construction is
motivated by large deviation principles. Precisely, construct the posterior
distribution $\nu^*$ as the entropic projection of $\nu_0$ onto the set

$$\Pi(\mathcal{M}) = \left\{ \mu \in \mathcal{P}(\mathbb{R}^n) : \mathbb{E}_\mu[\mathbb{P}_n(W)] \in \mathcal{M} \right\},$$

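where $\mathbb{E}_\mu[\mathbb{P}_n(W)]$ denotes the expectation of
$\mathbb{P}_n(W)$ when $W$ has distribution $\mu$. The MEM solution to the
inverse problem is defined as the expectation of $\mathbb{P}_n(W)$ under
the posterior distribution $\nu^*$,

$$\hat\mu = \mathbb{E}_{\nu^*}[\mathbb{P}_n(W)] = \mathbb{P}_n\left(\mathbb{E}_{\nu^*}(W)\right).$$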

This construction is justified by the large deviation principle stated in
Theorem 2.3 in [GG97], which establishes the asymptotic equivalence between
$\hat\mu$ and the conditional expectation
$\mathbb{E}_{\nu_0}(\mathbb{P}_n(W) \,|\, \mathbb{P}_n(W) \in \mathcal{M})$,
whenever it is well defined. The existence of the MEM estimator requires
the problem to be feasible in the sense that there exists at least one
solution $\delta$ in the interior of the convex hull of the support of
$\nu_0$ such that $\mathbb{P}_n(\delta) \in \mathcal{M}$. This assumption
warrants that the set $\Pi(\mathcal{M})$ is non-empty and therefore allows
the construction of the posterior distribution $\nu^*$.

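The MEM estimator $\hat\mu$ lies in the model $\mathcal{M}$ by
construction. As a result, there exists a solution $\hat\theta$ to the
moment condition $\hat\mu[\Phi(\hat\theta, \cdot)] = 0$. This solution is
precisely the MEM estimator of $\theta_0$. In Theorem 2.1 below, we give an
explicit expression for the MEM estimator $\hat\theta$. We write
$\mathbf{1} = (1, \dots, 1)^t \in \mathbb{R}^n$ and
$\Phi(\theta, X) = (\Phi(\theta, X_1), \dots, \Phi(\theta, X_n))^t \in \mathbb{R}^{n \times k}$,
and $\Lambda_\nu$ denotes the log-Laplace transform of a probability
measure $\nu$, that is,
$\Lambda_\nu(\tau) = \log \int \exp(\tau^t w)\,d\nu(w)$.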

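Theorem 2.1 If the problem is feasible, the MEM estimator $\hat\theta$ is
given by

$$\hat\theta = \arg\min_{\theta \in \Theta} \sup_{(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k} \left\{ n\gamma - \Lambda_{\nu_0}\left(\gamma \mathbf{1} + \Phi(\theta, X)\lambda\right) \right\}.$$

In particular, if $\nu_0$ has independent and identical marginals, i.e.
$\nu_0 = \nu^{\otimes n}$ for some probability measure $\nu$ on
$\mathbb{R}$, then

$$\hat\theta = \arg\min_{\theta \in \Theta} \sup_{(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k} \left\{ \gamma - \mathbb{P}_n\left[ \Lambda_\nu\left(\gamma + \lambda^t \Phi(\theta, \cdot)\right) \right] \right\}.$$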

The MEM estimator $\hat\theta$ can be expressed as the solution to a saddle
point problem of the kind that characterizes generalized empirical
likelihood. This result shows that maximum entropy on the mean with a prior
of the particular form $\nu_0 = \nu^{\otimes n}$ leads to a GEL procedure,
for which the criterion is the log-Laplace transform of $\nu$. This
approach provides a general Bayesian interpretation of GEL. Regularity
conditions on the criterion $\Lambda_\nu$ in the GEL framework are
reflected through conditions on the prior $\nu$. Indeed, the usual
normalization conditions $\Lambda_\nu'(0) = \Lambda_\nu''(0) = 1$
correspond to taking a prior $\nu$ with mean and variance equal to one,
while the normalization $\Lambda_\nu(0) = 0$ is imposed by the condition
$\nu \in \mathcal{P}(\mathbb{R})$.
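The correspondence between the normalization of the criterion and the moments of the prior follows from differentiating the log-Laplace transform under the integral sign, assuming this is permitted in a neighborhood of zero. Writing $W_1$ for a random variable with distribution $\nu$ (a notation introduced here for illustration), a short derivation reads:

```latex
\begin{align*}
\Lambda_\nu(s) &= \log \int e^{sx}\,d\nu(x),
  & \Lambda_\nu(0) &= \log \nu(\mathbb{R}) = 0, \\
\Lambda_\nu'(s) &= \frac{\int x\,e^{sx}\,d\nu(x)}{\int e^{sx}\,d\nu(x)},
  & \Lambda_\nu'(0) &= \int x\,d\nu(x) = \mathbb{E}(W_1), \\
\Lambda_\nu''(s) &= \frac{\int x^2 e^{sx}\,d\nu(x)}{\int e^{sx}\,d\nu(x)}
  - \left( \frac{\int x\,e^{sx}\,d\nu(x)}{\int e^{sx}\,d\nu(x)} \right)^{2},
  & \Lambda_\nu''(0) &= \mathbb{E}(W_1^2) - \mathbb{E}(W_1)^2 = \operatorname{Var}(W_1).
\end{align*}
```

Hence $\Lambda_\nu'(0) = \Lambda_\nu''(0) = 1$ holds exactly when the prior has unit mean and unit variance.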

An interesting choice of the prior is the exponential distribution
$d\nu(x) = e^{-x}\,dx$ for $x > 0$. Indeed, observe that if the $W_i$ are
i.i.d. with exponential distribution, the likelihood of $\mathbb{P}_n(W)$
is constant over the set of discrete probability measures
$\{\mathbb{P}_n(w) : \sum_{i=1}^n w_i = n\}$. Hence, an exponential prior
can be roughly interpreted as a non-informative prior in this framework.
The discrepancy associated with this prior is
$\Lambda_\nu(s) = -\log(1 - s)$, $s < 1$, which corresponds to the
empirical likelihood estimator of Owen [Owe91].

The MEM approach also provides a new probabilistic interpretation of some
commonly used specific GEL estimators. The exponential tilting of Kitamura
and Stutzer [KS97] is obtained for a Poisson prior of parameter 1, for
which we have $\Lambda_\nu(s) = e^s - 1$. Another example is the Gaussian
prior $\nu \sim \mathcal{N}(1, 1)$, leading to the continuous updating
estimator of Hansen, Heaton and Yaron [HHY96], as we have in this case
$\Lambda_\nu(s) = s + s^2/2$. The Gaussian prior allows the discrete
measure $\mathbb{P}_n(W)$ to have negative weights $w_i$ and must be
handled with care. Note, however, that this is generally not an issue in
practice since the solution $\hat\mu$ is implicitly chosen close to the
empirical distribution $\mathbb{P}_n$ and will have all its weights $w_i$
positive with high probability. More examples of classical priors leading
to usual discrepancies can be found in [GG97].
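Comparing the saddle point problem of Theorem 2.1 with the GEL saddle point problem suggests identifying the log-Laplace transform $\Lambda_\nu$ of the prior with the convex conjugate $f^*$ of the GEL generator. The Python sketch below checks this identification numerically for the three priors discussed above. The pairing of each prior with a generator $f$ and all function names are assumptions made for this illustration, not statements taken from the paper.

```python
import math

def conjugate(f, y, lo, hi, iters=200):
    """Evaluate f*(y) = sup_x {x*y - f(x)} over [lo, hi] by ternary search,
    which is valid because x -> x*y - f(x) is concave."""
    g = lambda x: x * y - f(x)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            lo = a
        else:
            hi = b
    return g(0.5 * (lo + hi))

# (generator f, log-Laplace transform of the prior, search interval for x)
PAIRS = {
    "exponential prior / EL": (lambda x: -math.log(x) + x - 1.0,
                               lambda s: -math.log(1.0 - s), 1e-9, 50.0),
    "Poisson(1) prior / ET": (lambda x: x * math.log(x) - x + 1.0,
                              lambda s: math.exp(s) - 1.0, 1e-9, 50.0),
    "N(1,1) prior / CUE": (lambda x: 0.5 * (x - 1.0) ** 2,
                           lambda s: s + 0.5 * s * s, -50.0, 50.0),
}

def max_gap(points=(-0.5, 0.0, 0.3)):
    """Largest discrepancy |f*(s) - Lambda(s)| over the test points."""
    gap = 0.0
    for f, lam, lo, hi in PAIRS.values():
        for s in points:
            gap = max(gap, abs(conjugate(f, s, lo, hi) - lam(s)))
    return gap

print(max_gap())
```

The search interval for the Gaussian case includes negative values of $x$, which mirrors the remark that a Gaussian prior allows negative weights.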

3 Dealing with an approximate operator 

In many practical applications, only an approximation of the constraint
function $\Phi$ is available to the practitioner. This occurs for instance
if the moment condition takes a complicated form that can only be evaluated
numerically. In [McF89], McFadden suggested a method dealing with
approximate constraints in a similar situation, introducing the method of
simulated moments (see also [CF00]). In [LP08] and [LR09], the authors
study a MEM procedure for linear inverse problems with approximate
constraints. Here, we propose to extend the results of [LP08] and [LR09] to
the GEL framework, using the connections between GEL and MEM.

We assume that we observe a sequence $\{\Phi_m\}_{m \in \mathbb{N}}$ of
approximate constraints, independent of the original sample
$X_1, \dots, X_n$ and converging toward the true function $\Phi$ at a rate
$\varphi_m$. We are interested in exhibiting sufficient conditions on the
sequence $\{\Phi_m\}_{m \in \mathbb{N}}$ under which estimating $\theta_0$
by the GEL procedure remains efficient when the constraint is replaced by
its approximation. We discuss the asymptotic properties of the resulting
estimates in a framework where both indices $n$ and $m$ simultaneously grow
to infinity.

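The approximate estimator is obtained by the GEL methodology, replacing the
constraint function $\Phi$ by its approximation $\Phi_m$,

$$\hat\theta_m = \arg\min_{\theta \in \Theta} \sup_{(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k} \left\{ \gamma - \mathbb{P}_n\left[ \Lambda\left(\gamma + \lambda^t \Phi_m(\theta, \cdot)\right) \right] \right\}, \qquad (3)$$

where $\Lambda : \mathbb{R} \to \mathbb{R}$ is a strictly convex, twice
differentiable function such that $\Lambda'(0) = \Lambda''(0) = 1$ and
$\Lambda(0) = 0$. As previously, the existence of $\hat\theta_m$ requires
the feasibility condition that the supremum of
$\gamma - \mathbb{P}_n[\Lambda(\gamma + \lambda^t \Phi_m(\theta, \cdot))]$
is reached for a finite value of
$(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k$, for at least one
value of $\theta \in \Theta$. This condition relies essentially on the
domain of $\Lambda$ being sufficiently large. We make the following
additional assumptions.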

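A.6. The functions $x \mapsto \|\Phi(\cdot, x)\|_\Theta$,
$x \mapsto \|\nabla\Phi(\cdot, x)\|_{\mathcal{N}}$ and
$x \mapsto \|\Psi(\cdot, x)\|_{\mathcal{N}}$ are dominated by a function
$\kappa$ such that $\int \kappa^2(x)\,d\mu_0(x) < \infty$.

A.7. For all $x \in \mathcal{X}$ and for sufficiently large $m$, the map
$\theta \mapsto \Phi_m(\theta, x)$ is twice continuously differentiable in
$\mathcal{N}$ and we write
$\nabla\Phi_m(\theta, \cdot) = \partial \Phi_m(\theta, \cdot)/\partial\theta$
and
$\Psi_m(\theta, \cdot) = \partial^2 \Phi_m(\theta, \cdot)/\partial\theta\,\partial\theta^t$.

A.8. The functions $x \mapsto \|\Phi_m(\cdot, x) - \Phi(\cdot, x)\|_\Theta$,
$x \mapsto \|\nabla\Phi_m(\cdot, x) - \nabla\Phi(\cdot, x)\|_{\mathcal{N}}$
and $x \mapsto \|\Psi_m(\cdot, x) - \Psi(\cdot, x)\|_{\mathcal{N}}$ are
dominated by a function $\kappa_m$ such that
$\int \kappa_m^2(x)\,d\mu_0(x) = O(\varphi_m^{-2})$.

A.9. The function $\Lambda''$ is bounded by a constant $K < \infty$.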

Assumptions A.6 to A.8 are made to obtain a uniform control over
$\|\hat\theta_m - \hat\theta\|$ for all $n \in \mathbb{N}$. The condition
A.9 implies that $\Lambda$ is dominated by a quadratic function. From the
MEM point of view, this condition is fulfilled for the log-Laplace
transform $\Lambda_\nu$ of sub-Gaussian priors $\nu$.

Theorem 3.1 (Robustness of GEL) If Assumptions A.1 to A.9 hold,

$$n \|\hat\theta_m - \hat\theta\|^2 = O_P(n \varphi_m^{-2}) + o_P(1).$$

Moreover, $\hat\theta_m$ is $\sqrt{n}$-consistent and asymptotically
equivalent to the GEL estimator computed with the exact constraint function
$\Phi$ whenever $n \varphi_m^{-2}$ tends to zero.

By considering a situation with an approximate operator, we extend the GEL
model to a more general framework that gives a more realistic formulation
of practical problems. The previous theorem gives an upper bound on the
error caused by the use of the approximation $\Phi_m$ in place of the true
function $\Phi$. This result provides insight into convergence conditions
that ensure asymptotic efficiency when dealing with an approximate
operator.
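The effect of an approximate constraint can be illustrated on a toy model. The sketch below assumes, purely for illustration, the scalar constraint $\Phi(\theta, x) = x - \psi(\theta)$ with $\psi(\theta) = \int_0^1 e^{\theta z}\,dz$, and the approximation $\Phi_m(\theta, x) = x - \psi_m(\theta)$ where $\psi_m$ is the midpoint rule with $m$ nodes. Since $k = d = 1$, the model is exactly identified and the GEL estimator simply solves $\mathbb{P}_n[\Phi(\theta, \cdot)] = 0$. None of these choices, names or data come from the paper.

```python
import math

def psi_exact(theta):
    """E[exp(theta * Z)] for Z uniform on (0, 1)."""
    return math.expm1(theta) / theta if theta != 0.0 else 1.0

def psi_approx(theta, m):
    """Midpoint-rule approximation of psi_exact with m nodes."""
    return sum(math.exp(theta * (j + 0.5) / m) for j in range(m)) / m

def solve_theta(xbar, psi, lo=0.0, hi=5.0, iters=200):
    """Bisection for psi(theta) = xbar, using that psi is increasing in theta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if psi(mid) < xbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

xs = [1.2, 1.5, 1.9, 2.1, 1.9]   # toy sample
xbar = sum(xs) / len(xs)

theta_hat = solve_theta(xbar, psi_exact)          # estimator with the exact constraint
errors = {
    m: abs(solve_theta(xbar, lambda t, m=m: psi_approx(t, m)) - theta_hat)
    for m in (2, 8, 32)                           # estimators with approximate constraints
}
print(theta_hat, errors)
```

Here the approximation error is deterministic and of order $m^{-2}$, so $\varphi_m$ plays the role of $m^2$ and the gap between the two estimators shrinks accordingly as $m$ grows.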



4 Proofs 

4.1 Proof of Theorem 2.1

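Let $S_\theta = \{w \in \mathbb{R}^n : \mathbb{P}_n(w) \in \mathcal{M}_\theta\}$
and $\mathcal{F}_w = \{\mu \in \mathcal{P}(\mathbb{R}^n) : \mathbb{E}_\mu(W) = w\}$.
We use that
$\inf_{\mu \in \mathcal{F}_w} \mathcal{K}(\mu, \nu_0) = \Lambda^*_{\nu_0}(w)$
(see [GG97]). Let
$\Pi(\mathcal{M}_\theta) = \{\mu \in \mathcal{P}(\mathbb{R}^n) : \mathbb{E}_\mu[\mathbb{P}_n(W)] \in \mathcal{M}_\theta\}$.
We have the equality

$$\hat\theta = \arg\min_{\theta \in \Theta} \inf_{\mu \in \Pi(\mathcal{M}_\theta)} \mathcal{K}(\mu, \nu_0) = \arg\min_{\theta \in \Theta} \inf_{w \in S_\theta} \inf_{\mu \in \mathcal{F}_w} \mathcal{K}(\mu, \nu_0),$$

which can be written

$$\hat\theta = \arg\min_{\theta \in \Theta} \inf_{w \in S_\theta} \Lambda^*_{\nu_0}(w) = \arg\min_{\theta \in \Theta} \inf_{w \in S_\theta} \sup_{\tau \in \mathbb{R}^n} \{\tau^t w - \Lambda_{\nu_0}(\tau)\}.$$

The feasibility assumption warrants that the extrema are reached. Hence,
using Sion's minimax theorem, we find

$$\hat\theta = \arg\min_{\theta \in \Theta} \sup_{\tau \in \mathbb{R}^n} \inf_{w \in S_\theta} \{\tau^t w - \Lambda_{\nu_0}(\tau)\}.$$

We know that $w = (w_1, \dots, w_n)^t \in S_\theta$ if and only if
$\sum_{i=1}^n w_i = n$ and $\sum_{i=1}^n w_i \Phi(\theta, X_i) = 0$. Thus,
for a fixed value of $\tau$, the map
$w \mapsto \tau^t w - \Lambda_{\nu_0}(\tau)$ can be arbitrarily close to
$-\infty$ on $S_\theta$ whenever $\tau$ does not lie in the linear span of
$\mathbf{1}$ and the columns of $\Phi(\theta, X)$. As a result, we may
assume that $\tau = \gamma \mathbf{1} + \Phi(\theta, X)\lambda$ for some
$(\gamma, \lambda) \in \mathbb{R} \times \mathbb{R}^k$ without loss of
generality. In this case, the map
$w \mapsto \tau^t w - \Lambda_{\nu_0}(\tau)$ is constant over $S_\theta$,
equal to
$n\gamma - \Lambda_{\nu_0}(\gamma \mathbf{1} + \Phi(\theta, X)\lambda)$,
which ends the proof. If $\nu_0 = \nu^{\otimes n}$, then
$\Lambda_{\nu_0}(\tau) = \sum_{i=1}^n \Lambda_\nu(\tau_i)$ and we conclude
easily.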

4.2 Proof of Theorem 3.1

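The proof relies mainly on the uniform law of large numbers, using that the
set
$\{\|\Phi_m(\theta, \cdot)\|, \|\nabla\Phi_m(\theta, \cdot)\|, \|\Psi_m(\theta, \cdot)\|, \ \theta \in \Theta, \ m \in \mathbb{N}\}$
is a Glivenko-Cantelli class of functions, as a consequence of A.6 and A.8.
For all $\theta \in \Theta$ and $v \in \mathbb{R}^k$, let

$$h_n(\theta, v) = \begin{pmatrix} \mathbb{P}_n\left[\Phi(\theta, \cdot)\,\Lambda'(v^t \Phi(\theta, \cdot))\right] \\ \mathbb{P}_n\left[\nabla\Phi(\theta, \cdot)\,v\,\Lambda'(v^t \Phi(\theta, \cdot))\right] \end{pmatrix}, \qquad h_{m,n}(\theta, v) = \begin{pmatrix} \mathbb{P}_n\left[\Phi_m(\theta, \cdot)\,\Lambda'(v^t \Phi_m(\theta, \cdot))\right] \\ \mathbb{P}_n\left[\nabla\Phi_m(\theta, \cdot)\,v\,\Lambda'(v^t \Phi_m(\theta, \cdot))\right] \end{pmatrix}.$$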



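The pair $(\hat\theta_m, \hat v_m)$ (resp. $(\hat\theta, \hat v)$) is
defined as the unique zero over $\Theta \times \mathbb{R}^k$ of $h_{m,n}$
(resp. $h_n$). The condition A.9 implies that
$|\Lambda'(s)| \le K|s| + 1$ for all $s \in \mathbb{R}$. Hence, using
successively the mean value theorem and the Cauchy-Schwarz inequality, we
show that the contrast function $h_{m,n}$ converges uniformly on every
compact set toward $h_n$ as $m \to \infty$, which warrants the convergence
of $(\hat\theta_m, \hat v_m)$ toward $(\hat\theta, \hat v)$. For all
$v \in \mathbb{R}^k$, the map $\theta \mapsto \nabla h_{m,n}(\theta, v)$ is
continuous in a neighborhood of $\theta_0$ for sufficiently large values of
$m$ by the condition A.7, as explicit calculation gives

$$\nabla h_{m,n}(\theta, v) = \begin{pmatrix} D_{m,n}^t(\theta, v) & V_{m,n}(\theta, v) \\ A_{m,n}(\theta, v) & D_{m,n}(\theta, v) \end{pmatrix},$$

where

$$A_{m,n}(\theta, v) = \mathbb{P}_n\left[\Psi_m(\theta, \cdot)\,v\,\Lambda'(v^t \Phi_m(\theta, \cdot)) + \nabla\Phi_m(\theta, \cdot)\,v v^t\,\nabla\Phi_m^t(\theta, \cdot)\,\Lambda''(v^t \Phi_m(\theta, \cdot))\right],$$

$$D_{m,n}(\theta, v) = \mathbb{P}_n\left[\nabla\Phi_m(\theta, \cdot)\,\Lambda'(v^t \Phi_m(\theta, \cdot)) + \nabla\Phi_m(\theta, \cdot)\,v\,\Phi_m^t(\theta, \cdot)\,\Lambda''(v^t \Phi_m(\theta, \cdot))\right],$$

$$V_{m,n}(\theta, v) = \mathbb{P}_n\left[\Phi_m(\theta, \cdot)\,\Phi_m^t(\theta, \cdot)\,\Lambda''(v^t \Phi_m(\theta, \cdot))\right].$$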

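We define in the same way $A_n(\theta, v)$, $D_n(\theta, v)$ and
$V_n(\theta, v)$ by replacing $\Phi_m$ by $\Phi$ in the expressions above.
Using the Cauchy-Schwarz inequality, A.8 ensures the uniform convergence of
$\nabla h_{m,n}$ toward $\nabla h_n$ on every compact set at the rate
$\varphi_m$. Denote by $\rho_n$ the smallest eigenvalue of
$\nabla h_n(\hat\theta, \hat v)$. We know from Theorem 3.2 in [NS04] that
$P(\rho_n \le \eta) = o(n^{-1})$ for sufficiently small $\eta > 0$, since
A.5 ensures that the limit of $\nabla h_n(\hat\theta, \hat v)$ as
$n \to \infty$ is positive definite. Thus, for $c > 0$ sufficiently small,
consider the event $\Omega = \{\rho_n > c\}$. Writing the Taylor expansion
of $h_{m,n}$ at $(\hat\theta, \hat v)$ and using that
$h_{m,n}(\hat\theta_m, \hat v_m) = 0$, we deduce that on $\Omega$,

$$\begin{pmatrix} \hat\theta_m - \hat\theta \\ \hat v_m - \hat v \end{pmatrix} = -\left[\nabla h_n(\hat\theta, \hat v)\right]^{-1} h_{m,n}(\hat\theta, \hat v) + O_P(\varphi_m^{-1}).$$

The Schur complement formula gives in particular

$$\hat\theta_m - \hat\theta = -\left[D_n V_n^{-1} D_n^t\right]^{-1} D_n V_n^{-1}\,\mathbb{P}_n\left[\Phi_m(\hat\theta, \cdot)\,\Lambda'(\hat v^t \Phi_m(\hat\theta, \cdot))\right] + O_P(\varphi_m^{-1}) + o_P(n^{-1/2}),$$

where $D_n = D_n(\hat\theta, \hat v)$ and $V_n = V_n(\hat\theta, \hat v)$
and where we used that $\hat v = O_P(n^{-1/2})$ (see for instance Theorem
3.2 in [NS04]). Thus, on the event $\Omega$, there exists a constant
$C > 0$ such that

$$\|\hat\theta_m - \hat\theta\| \le C \left\| \mathbb{P}_n\left[\Phi_m(\hat\theta, \cdot)\,\Lambda'(\hat v^t \Phi_m(\hat\theta, \cdot))\right] \right\| + O_P(\varphi_m^{-1}) + o_P(n^{-1/2}).$$

By construction,
$\mathbb{P}_n[\Phi(\hat\theta, \cdot)\,\Lambda'(\hat v^t \Phi(\hat\theta, \cdot))] = 0$,
which yields

$$\left\| \mathbb{P}_n\left[\Phi_m(\hat\theta, \cdot)\,\Lambda'(\hat v^t \Phi_m(\hat\theta, \cdot))\right] \right\| \le \mathbb{P}_n \left\| (\Phi_m(\hat\theta, \cdot) - \Phi(\hat\theta, \cdot))\,\Lambda'(\hat v^t \Phi_m(\hat\theta, \cdot)) \right\| + \mathbb{P}_n \left\| \Phi(\hat\theta, \cdot)\left[\Lambda'(\hat v^t \Phi_m(\hat\theta, \cdot)) - \Lambda'(\hat v^t \Phi(\hat\theta, \cdot))\right] \right\|$$

$$\le K \|\hat v\|\,\mathbb{P}_n\left[ \left(\|\Phi_m(\hat\theta, \cdot)\| + \|\Phi(\hat\theta, \cdot)\|\right) \|\Phi_m(\hat\theta, \cdot) - \Phi(\hat\theta, \cdot)\| \right] + \mathbb{P}_n \|\Phi_m(\hat\theta, \cdot) - \Phi(\hat\theta, \cdot)\|,$$

as a consequence of A.9. We conclude that
$\|\hat\theta_m - \hat\theta\|^2 \mathbf{1}_\Omega = O_P(\varphi_m^{-2}) + o_P(n^{-1})$
by the condition A.8. On the complement of $\Omega$,
$\|\hat\theta_m - \hat\theta\|$ can be bounded by the diameter $\delta$ of
$\Theta$, yielding
$\|\hat\theta_m - \hat\theta\|^2 \mathbf{1}_{\Omega^c} = o_P(n^{-1})$, which
ends the proof.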